Abstract

Background: Leukocoria is defined as a white reflection and its manifestation is symptomatic of several ocular 
pathologies, including retinoblastoma (Rb). Early detection of recurrent leukocoria is critical for improved patient 
outcomes and can be accomplished via the examination of recreational photography. To date, there exists a paucity 
of methods to automate leukocoria detection within such a dataset. 

Methods: This research explores a novel classification scheme that uses fuzzy logic theory to combine a number of 
classifiers that are experts in performing multichannel detection of leukocoria from recreational photography. The 
proposed scheme extracts features aided by the discrete cosine transform and the Karhunen-Loeve transformation. 

Results: The soft fusion of classifiers is significantly better than other methods of combining classifiers, with
$p = 1.12 \times 10^{-5}$. The proposed methodology performs at a 92% accuracy rate, with an 89% true positive rate and an
11% false positive rate. Furthermore, the results produced by our methodology exhibit the lowest average variance.

Conclusions: The proposed methodology overcomes non-ideal conditions of image acquisition, presenting a
competent approach for the detection of leukocoria. Results suggest that recreational photography can be used in 
combination with the fusion of individual experts in multichannel classification and preprocessing tools such as the 
discrete cosine transform and the Karhunen-Loeve transformation. 

Keywords: Leukocoria, Retinoblastoma, Fuzzy logic, Soft computing, Discrete cosine transform, Karhunen-Loeve 
transform 



Background 

Leukocoria is an abnormal pupillary light reflex that is 
characterized by a persistent 'white-eye' phenomenon 
during visible light photography. It is often the primary 
observable diagnostic symptom for a range of catastrophic 
ocular disorders. In addition, leukocoria is a prevailing 
symptom of congenital cataracts, vitreoretinal disorders 
and malformations, retinopathy of prematurity, trauma- 
associated diseases, Coats' disease, ocular toxocariasis, 
Norrie disease, ciliary melanoma, retrolental fibroplasia, 
and retinal hamartomas [1,2], see [3] for a review. In chil- 
dren under the age of 5, however, the predominant cause 
of leukocoria is Rb [4,5]. 

In the case of Rb, tumors in the eye can act as diffuse 
reflectors of visible light [6-9]. Consequently, leukocoria 
associated with Rb is a progressive symptom that occurs
more frequently, during recreational photography, as the 
size and number of tumors increase [10]. The fact that 
it occurs in recreational photography opens the door to 
investigate a way to perform an automatic assessment 
of visual dysfunction [11]. Leukocoria is optically dis- 
tinct from specular reflections of the cornea and can be 
detected with a low resolution digital camera, a camera 
phone equipped with or without a flash, or with a dig- 
ital video recorder. In clinical settings, the "red reflex" 
test is adequate for the identification of tumor reflections 
when administered by trained clinicians, but may suffer 
from a high degree of false negatives when conducted 
under a wide range of conditions [12,13]. This ineffec- 
tiveness of the "red-reflex" test is especially problematic 
in developing nations where there is a limited supply of 
properly trained specialists in ophthalmology or pedi- 
atrics. Even in developed nations, recent studies suggest 
that clinicians are either improperly trained for leukocoric 
screening, or do not perform the test [14]. Indeed, parents
or relatives are generally the first individuals to detect
leukocoria in a child, and their observation often initi- 
ates diagnosis [1,4,15-17]. For example, in a study of 1632 
patients with Rb, the eventual diagnosis in ~80% of cases 
was initiated by a relative who observed leukocoria in a 
photograph [4]. 

The consequences of a false negative finding can be 
profound, as the case of Rb illustrates. While it only 
comprises 3-4% of pediatric cancer, the incidence of 
Rb is high enough (i.e., ~ 1-2:30,000 live births) to 
mandate universal screening [4,13]. The median age of 
diagnosis is 24 months for unilateral disease and 9-12 
months for bilateral disease [18,19]. When detected 
early, Rb is curable, either by enucleation of the eye, or 
the use of ocular salvage treatments with chemother- 
apy and focal treatments or radiation therapy [20,21]. 
Delays in diagnosis lead to increased rates of vision 
loss, need for therapy intensification (with its associ- 
ated life-time toxicity) and death, particularly for chil- 
dren who live in resource-poor settings [7]. Compressing 
diagnostic time frames relies, in part, on improved methods
for detecting intraocular tumors or their leukocoric
presentation. 

The autonomous and semi-autonomous analysis of
diagnostic medical images, such as that mediated by
computational biology and machine learning, is routinely
used for the unsupervised and supervised prediction
and prognosis of numerous pathologies and
pathology outcomes, but has had limited application
in areas of detection and diagnosis [22,23]. In applications
where machine learning has been applied to the
discernment of disease based on image data (analogous 
to the observable detection of leukocoria in digital pho- 
tographs), there has been significant success. These pre- 
vious studies have employed a variety of soft computing 
techniques: support vector machines (SVMs), Bayesian 
statistical approaches and neural networks have been used 
to assist in the detection of breast cancer in mammo- 
grams [24], prostate cancer [25], lung cancer [26] and 
cervical cancers [27]. Of particular importance has been 
the successful use of neural networks for the detection of 
skin cancers, such as melanoma, where non-histological 
photographic digital images serve as the medium 
[28-31]. In each of these scenarios, however, studies 
have been applied to controlled environments where 
skilled technicians intentionally seek to classify disease 
states. 

In spite of the apparent symptomology and recent suc- 
cesses in categorization [10], the automated or semi- 
automated detection of leukocoria remains a naive 
process. Therefore, this paper proposes a classification 
algorithm that detects a leukocoric eye using images (see 
Figure 1) processed to automatically detect faces and the 



position of the eyes [32], regions of interest, i.e., both 
eyes, and, finally, an individual class for each eye using 
a soft fusion of multiple classifiers to produce optimal 
results. The essential property of soft fusion of classi- 
fiers is the use of fuzzy integrals as a similarity mea- 
sure [33,34]. While still a very active area of research 
[35,36], the fusion of multiple classifiers based on sup- 
port vector machines, neural networks, and discrimi- 
nant analysis has had success, such as the classification 
of bacteria [37], handwriting images [38], credit scores 
[39], and remote sensing [40]. Here, we demonstrate that 
this approach is a significant improvement over alter- 
native methods of machine learning-enabled leukocoria 
detection. 

Methods 

Ethics statement 

This study was determined to be exempt from review 
by an Institutional Review Board at Baylor University. 
The parents of the study participants have given written 
informed consent to use and publish unaltered images of 
faces. 

Database and feature extraction 

This research uses a database of digital images corre- 
sponding to the eyes of 72 faces, for a total of 144 eye 
images. This database is strictly an internal collection 
of images produced by the authors of this paper, conse- 
quently, no external permission is required. To the best 
of our knowledge, there are no other databases for this 
task. 

Out of the 144 eye images, 54 eyes are labeled as 
"leukocoric" while the remaining 90 are labeled as 
"healthy". This implies that the database is unbalanced 
with 37.5% being the positive class and 62.5% the negative.
Image sizes vary from 19 x 19 (the smallest) to
138 x 138 (the largest). Orientation, angle,
and rotation of each eye vary from image to image. The
database includes faces with different skin and iris colors.
Illumination is not controlled and varies depending on the
distance between the face and the flash of the camera.
Also, different cameras were used to build the database. 
Figure 2 depicts several example images from the 
database. 

Figure 2 shows samples for the two classes and illus- 
trates the challenges mentioned above. These chal- 
lenges demand a pre-processing strategy that reduces 
the effect of random factors in the acquisition process. 
We use the strategy explained herein and presented in 
Figure 3. 

First, the input image is cropped to contain only the 
M x N image of the circumference delimited by the iris. 
This process can be done either manually or automatically. 
Figure 1 Process of classification of two input images.



Secondly, the cropped M x N three-channel (RGB)
image, denoted as $I(n_1, n_2, n_3)$, where $n_1 \in \{0, \dots, M-1\}$,
$n_2 \in \{0, \dots, N-1\}$, and $n_3 \in \{0, 1, 2\}$, is separated into
three different gray-scale images, $I_R(n_1, n_2)$, $I_G(n_1, n_2)$,
and $I_B(n_1, n_2)$.



The next step leverages the 2D-DCT to alleviate the problem
of variant illumination in all three channels. For
an image $I(n_1, n_2)$ of size M x N, we can determine a
matrix $F_I(k_1, k_2)$, also of size M x N, that contains all the
spatial frequency components of the image, for $k_1 \in \{0, \dots, M-1\}$
and $k_2 \in \{0, \dots, N-1\}$. The matrix $F_I$ can
be computed with the 2D-DCT in the following manner:

$$F_I(k_1, k_2) = \mathcal{F}\{I(n_1, n_2)\}$$

Figure 2 Sample images from the experimental database.

Figure 3 Proposed image pre-processing strategy and feature extraction for the detection of leukocoria.
According to [41], discarding the first three coefficients
of $F_I(k_1, k_2)$ will counter the variation of illumination
within the image. That is, an altered frequency domain
matrix $\hat{F}_I(k_1, k_2)$ is created by discarding the elements in
the coordinates $(k_1 = 0, k_2 = 0)$, $(k_1 = 0, k_2 = 1)$, and
$(k_1 = 1, k_2 = 0)$ of $F_I$. After discarding the first three
DCT coefficients, $\hat{F}_I$ is inversely transformed from the
frequency domain to the spatial domain as follows:

$$\hat{I}(n_1, n_2) = \mathcal{F}^{-1}\{\hat{F}_I(k_1, k_2)\} = \sum_{k_1=0}^{M-1} \sum_{k_2=0}^{N-1} \alpha(k_1)\,\alpha(k_2)\,\hat{F}_I(k_1, k_2) \times \cdots$$

and $\alpha(\cdot)$ is also computed
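As a rough illustration of the illumination step described above, the following Python sketch zeroes the three lowest-frequency DCT coefficients and inverts the transform. It assumes the orthonormal DCT-II convention for the scaling function `alpha`, which may differ from the normalization used by the authors; the function names (`dct2`, `idct2`, `remove_illumination`) are hypothetical. The direct quadruple loop is only practical for small eye crops.

```python
import math


def alpha(k, size):
    """Scaling factor of the orthonormal DCT-II (an assumed convention)."""
    return math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)


def _cos(n, k, size):
    return math.cos(math.pi * (2 * n + 1) * k / (2 * size))


def dct2(img):
    """2D DCT of an M x N image given as a list of rows."""
    M, N = len(img), len(img[0])
    out = [[0.0] * N for _ in range(M)]
    for k1 in range(M):
        for k2 in range(N):
            s = 0.0
            for n1 in range(M):
                for n2 in range(N):
                    s += img[n1][n2] * _cos(n1, k1, M) * _cos(n2, k2, N)
            out[k1][k2] = alpha(k1, M) * alpha(k2, N) * s
    return out


def idct2(coef):
    """Inverse of dct2 under the same orthonormal convention."""
    M, N = len(coef), len(coef[0])
    out = [[0.0] * N for _ in range(M)]
    for n1 in range(M):
        for n2 in range(N):
            s = 0.0
            for k1 in range(M):
                for k2 in range(N):
                    s += (alpha(k1, M) * alpha(k2, N) * coef[k1][k2]
                          * _cos(n1, k1, M) * _cos(n2, k2, N))
            out[n1][n2] = s
    return out


def remove_illumination(img):
    """Discard coefficients (0,0), (0,1), (1,0) and transform back."""
    coef = dct2(img)
    for k1, k2 in ((0, 0), (0, 1), (1, 0)):
        if k1 < len(coef) and k2 < len(coef[0]):
            coef[k1][k2] = 0.0
    return idct2(coef)
```

Because the (0, 0) coefficient carries the mean intensity, a uniformly lit patch maps to an all-zero image under this sketch, and any output image has zero mean.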



Fourth, each image I is then down-sampled or up- 
sampled to a fixed size of 32 x 32. The selection of this 
particular size was determined experimentally, training 
several classifiers using different image sizes and choosing 
the size that produced the smallest classification error in 
the average case, which was 32 x 32. Note that this is a 
very small resolution compared to the natural resolution 
of recreational photographs. 

Fifth, we z-score each channel (subtract the mean and divide by the
standard deviation). The purpose is to
have a dataset approximating a $\mathcal{N}(0, 1)$ distribution at
each channel, that is, a dataset that follows a normal
distribution with zero mean and unit variance. The mean and standard
deviation used for z-scoring are estimated only from the images
available for training, i.e., the training dataset. Images in
the testing dataset are z-scored with the mean and
standard deviation estimated from the training dataset. We
define $\tilde{I}$ as the image $I$ that has been processed by up-sampling
or down-sampling, subtraction of a mean image,
and division by a standard deviation.
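The separation between training and testing statistics can be sketched as follows. This is a minimal Python illustration, assuming each image has already been flattened into a list of pixel values and that a per-pixel mean image and standard deviation are used; the names `fit_zscore` and `apply_zscore`, the use of the population variance, and the guard for constant pixels are assumptions of the sketch.

```python
import math


def fit_zscore(train_images):
    """Estimate the per-pixel mean and standard deviation from training data only."""
    n = len(train_images)
    dim = len(train_images[0])
    mean = [sum(img[j] for img in train_images) / n for j in range(dim)]
    std = []
    for j in range(dim):
        var = sum((img[j] - mean[j]) ** 2 for img in train_images) / n
        # Assumption: a constant pixel keeps a divisor of 1 to avoid dividing by zero.
        std.append(math.sqrt(var) or 1.0)
    return mean, std


def apply_zscore(image, mean, std):
    """Z-score one image (training or testing) with previously fitted statistics."""
    return [(v - m) / s for v, m, s in zip(image, mean, std)]
```

A testing image is passed through `apply_zscore` with the `mean` and `std` returned by `fit_zscore` on the training fold, so no information from the testing set leaks into the normalization.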

Finally, the Karhunen-Loeve Transform (KLT) is applied
to the data using only the two eigenvectors whose corresponding
eigenvalues are the largest [42,43].
This procedure is analogous to dimensionality reduction
using Principal Component Analysis (PCA). Experimental
research determined that the minimum number of eigenvectors
that can be used without loss of generalization is
two. We define $\mathbf{x}_i$ as a two-row vector defining the i-th
eye image transformed using the KLT; that is, $\mathbf{x}_i = \mathcal{T}\{\tilde{I}_i\}$,
where $\mathcal{T}\{\cdot\}$ denotes the KLT. Therefore, the transformed
training set for each individual channel is defined as
$\mathcal{D} = \{\mathbf{x}_i, d_i\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^2$, $d_i \in \{-1, 1\}$ is the desired
target class corresponding to the i-th vector (indicating
normal or leukocoric), and N indicates the total number
of training samples. Then, the training set $\mathcal{D}$ is used in
the design of classifiers, which is explained in the next
section.
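The projection onto the two leading eigenvectors can be sketched in plain Python as shown below. This is only an illustration under stated assumptions: the eigenvectors are obtained by power iteration with deflation (a simple stand-in for a full eigendecomposition), the data are centered before projection, and the names `fit_klt` and `klt_transform` are hypothetical.

```python
import math


def covariance(rows):
    """Return the mean vector and sample covariance matrix of the rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return mean, cov


def dominant_eigenpair(mat, iters=500):
    """Power iteration for the eigenpair with the largest eigenvalue."""
    d = len(mat)
    v = [1.0 / (j + 1) for j in range(d)]  # fixed, non-uniform start vector
    for _ in range(iters):
        w = [sum(mat[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(mat[a][b] * v[b] for b in range(d)) for a in range(d)]
    return sum(x * y for x, y in zip(v, mv)), v


def fit_klt(rows, n_components=2):
    """Estimate the mean and the leading eigenvectors from training rows."""
    mean, cov = covariance(rows)
    d = len(cov)
    basis = []
    for _ in range(n_components):
        value, vector = dominant_eigenpair(cov)
        basis.append(vector)
        # Deflation: subtract the component just found before the next search.
        cov = [[cov[a][b] - value * vector[a] * vector[b] for b in range(d)]
               for a in range(d)]
    return mean, basis


def klt_transform(row, mean, basis):
    """Project one (flattened) image onto the retained eigenvectors."""
    return [sum((x - m) * v for x, m, v in zip(row, mean, vec)) for vec in basis]
```

With `n_components=2`, each flattened 32 x 32 image would be reduced to a vector in $\mathbb{R}^2$. As with z-scoring, `fit_klt` would be called on the training fold only.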
Classification architecture

The proposed classification scheme involves the fusion of 
different classifiers that are known to perform well indi- 
vidually. The purpose of the fusion is to achieve better per- 
formance than with individual classifiers [44]. The fusion 
of classifiers is also known as "combination of multiple 
classifiers" [45], "mixture of experts" [46], or "consensus 
aggregation" [47]. This paper uses fuzzy logic to combine 
different classifiers using the method proposed in [33,34]. 
A fuzzy integral conceptualizes the idea of the method
along with Sugeno's $g_\lambda$-fuzzy measure [48]. The different
classifier performances define the importance that the
fusion method will give to each classifier. We propose 
having nine different classifiers per channel, as shown in 
Figure 3. The total number of classifiers is 27. We per- 
form the analysis of each channel aiming to observe which 
channel performs better and to determine its contribution 
to correct classification in further studies. A final class is 
given considering each classifier's output at each channel. 
The following paragraphs explain the fusion methodology. 

Soft fusion of classifiers 

Revisiting [33] and [48], we have that a set function $g: 2^{\mathcal{Y}} \to [0, 1]$
is called a fuzzy measure if 1) $g(\emptyset) = 0$, $g(\mathcal{Y}) = 1$,
2) $g(A) \le g(B)$ if $A \subset B$, and 3) if $\{A_i\}_{i=1}^{\infty}$ is an increasing
sequence of measurable sets, then $\lim_{i \to \infty} g(A_i) = g(\lim_{i \to \infty} A_i)$.
This can be used to define the following equality:

$$g(A \cup B) = g(A) + g(B) + \lambda\, g(A)\, g(B), \tag{4}$$

which is known as the $g_\lambda$-fuzzy measure, for some $\lambda > -1$,
all $A, B \subset \mathcal{Y}$, and $A \cap B = \emptyset$.

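If we consider $\mathcal{Y}$ as a finite set and $h: \mathcal{Y} \to [0, 1]$ as
a fuzzy subset of $\mathcal{Y}$, then the fuzzy integral over $\mathcal{Y}$ of
the function h w.r.t. a fuzzy measure g can be defined as
follows:

$$\int_{\mathcal{Y}} h(y) \circ g(\cdot) = \max_{E \subseteq \mathcal{Y}} \left[ \min\left( \min_{y \in E} h(y),\, g(E) \right) \right] = \max_{t \in [0, 1]} \left[ \min\left( t,\, g(C_t) \right) \right], \tag{5}$$

where $C_t = \{y \mid h(y) \ge t\}$. The equality in Equation 5
defines the agreement between the expectation and the
evidence.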

Particularly, let $\mathcal{Y}$ define a finite set containing the outputs
of n classifiers, that is, $\mathcal{Y} = \{y_1, y_2, \dots, y_n\}$. Let
$h: \mathcal{Y} \to [0, 1]$ be a function that tells the certainty of a classifier's
output to belong to a given class (i.e., provides the
"evidence"). Then, order the classifiers according to their
current classification certainty, such that $h(y_1) \ge h(y_2) \ge \dots \ge h(y_n)$.
Then it follows to define the fuzzy integral e
w.r.t. a fuzzy measure g over $\mathcal{Y}$ as follows:

$$e = \max_{i=1}^{n} \left[ \min\left( h(y_i),\, g(A_i) \right) \right], \tag{6}$$

where $A_i = \{y_1, y_2, \dots, y_i\}$. Furthermore, since g is a $g_\lambda$-fuzzy
measure, each value for $g(A_i)$ can be computed
using the following recursive equation:



$$g(A_i) = \begin{cases} g(\{y_1\}) = g^1 & \text{for } i = 1, \\ g^i + g(A_{i-1}) + \lambda\, g^i\, g(A_{i-1}) & \text{for } 1 < i \le n, \end{cases} \tag{7}$$



where $\lambda$ is the unique root greater than $-1$ that can be
obtained by solving the following polynomial:

$$\lambda + 1 = \prod_{i=1}^{n} \left( 1 + \lambda g^i \right), \tag{8}$$

where $\lambda \in (-1, +\infty)$ and $\lambda \ne 0$. However, in order
to solve the polynomial, we need to estimate the densities
$g^i$ (i.e., "the expectation"). The i-th density defines
the degree of importance the i-th classifier $y_i$ has in the
final classification. These densities can be estimated by an
expert, or defined using a training dataset. In this research
we defined the densities using the performance obtained
from the data, and the process of experimentation will
be explained later. In the following subsection we briefly
discuss the classifiers used in this research.
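The computation described by Equations 6 through 8 can be sketched in a few lines of Python. The sketch below is illustrative only: the bisection root finder, the bracketing constants, and the function names (`solve_lambda`, `sugeno_integral`) are assumptions, and the case where the densities sum exactly to one is treated as the additive limit with $\lambda = 0$.

```python
def solve_lambda(densities):
    """Root of lambda + 1 = prod(1 + lambda * g_i) with lambda > -1 (Equation 8)."""
    if len(densities) < 2:
        raise ValueError("at least two densities are required")
    total = sum(densities)
    if abs(total - 1.0) < 1e-9:
        return 0.0  # assumption: additive limiting case

    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (lam + 1.0)

    if total > 1.0:
        lo, hi = -1.0 + 1e-12, -1e-9   # the root lies in (-1, 0)
    else:
        lo, hi = 1e-9, 1.0             # the root lies in (0, +inf)
        while f(hi) <= 0.0:
            hi *= 2.0
    sign_lo = f(lo) > 0.0
    for _ in range(200):               # plain bisection
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0.0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sugeno_integral(h, densities):
    """Fuzzy integral e of Equation 6 for certainties h and densities g^i."""
    lam = solve_lambda(densities)
    # Visit classifiers in order of decreasing certainty h(y_1) >= h(y_2) >= ...
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    g_prev = 0.0
    best = 0.0
    for i in order:
        g_i = densities[i]
        g_cur = g_i + g_prev + lam * g_i * g_prev  # recursion of Equation 7
        best = max(best, min(h[i], g_cur))
        g_prev = g_cur
    return best
```

One plausible way to use this sketch is to evaluate `sugeno_integral` once per class, with `h` holding each classifier's certainty for that class, and then to choose the class with the larger integral; that decision rule is an assumption here rather than something stated in this section.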

Selection of classifiers 

We are using three different kinds of classifiers: artificial 
neural network (ANN) -based, support vector machines 
(SVM)-based, and discriminant analysis (DA)-based. The 
three ANN-based classifiers we use for each channel have 
the same Feed-Forward (FF) architecture [49]; the differ- 
ence lies in the number of neurons in each hidden layer. 
The two outputs of each neural network have softmax 
activation functions; the goal is to train the neural net- 
works to approximate probability density functions of the 
problem and output the posterior probabilities at the out- 
put layer. Thus the output layer's activation functions, 
softmax, act as the function h that maps the output of the 
classifier to values in the range [0, 1] indicating classifi- 
cation certainty for either class. We used a partial subset 
of data and started training with three different groups: 
networks that randomly have between a) 2-5 neurons, 
b) 6-25 neurons, and c) 26-125 neurons. After a large 
number of experiments we concluded that the three best 
architectures were those shown in Table 1. The selection 
was performed based on those networks whose balanced 
error rate (BER) was the lowest in the average case. 

Table 1 Number of hidden neurons for each channel

| Channel | ANN$_1$ | ANN$_2$ | ANN$_3$ |
|---|---|---|---|
| Red | 2 | 20 | 50 |
| Green | 3 | 10 | 15 |
| Blue | 2 | 3 | 5 |

For example, consider the third row of Table 1; for the blue
channel, the best three architectures were those with two, 
three, and five neurons in the hidden layer; in contrast, 
the red channel exhibited the lowest errors using two, 20, 
and 50 neurons in the hidden layer. Intuitively, one can 
conclude that the training data for both green and blue 
channels is much simpler to classify than the data for the 
red channel. 

Next, the SVM-based classifiers in this research are, by 
necessity, of the soft margin kind since the dataset has two 
non-linearly separable classes [50]. This research uses four 
SVMs; each has a different type of kernel function. The 
four SVM kernel functions are: 1) linear, 2) quadratic, 3) 
polynomial, and 4) radial basis function (RBF). 

An SVM with a linear kernel is the simplest form of a
soft margin SVM; in practice it only performs a dot product,
leaving the data in the input space. SVMs with a
quadratic kernel are a particular case of a polynomial
kernel of second degree. An RBF kernel is a preferred
choice in research that offers little or no information about
the dataset properties. SVMs can be very powerful, but
their effectiveness depends on an appropriate
selection of model parameters, a.k.a. hyper-parameters
[51]. The traditional soft-margin SVM requires a
hyper-parameter usually known as the "regularization" parameter,
C, that penalizes data points incorrectly classified.
Then, depending on the kernel choice, SVMs may have
additional hyper-parameters; e.g., the polynomial kernel
requires a parameter p that defines the degree of the
polynomial, while the RBF kernel requires the parameter $\tau$,
which controls the wideness of an exponential
Gaussian-like function.
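The kernel families named above can be written compactly in Python. The sketch below is a generic illustration rather than the exact forms used in this research: the unit offset in the polynomial kernel and the parameterization of the RBF kernel as `exp(-tau * ||a - b||^2)` are assumptions, as are the function names.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(a, b):
    """Plain dot product; the data stay in the input space."""
    return dot(a, b)


def polynomial_kernel(a, b, p, offset=1.0):
    """Polynomial kernel of degree p (p = 2 gives the quadratic kernel).

    The offset of 1.0 is an assumed convention.
    """
    return (dot(a, b) + offset) ** p


def rbf_kernel(a, b, tau):
    """Gaussian-like kernel; tau controls its width (assumed parameterization)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-tau * squared_distance)
```

For two-dimensional KLT features, a call such as `rbf_kernel(x_i, x_j, tau=0.5)` returns a similarity in (0, 1] that equals one only when the two vectors coincide.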

The typical method to find a "good" set of hyper-parameters
is called "grid search", which can sometimes
be computationally costly, especially if the data set is
large. Thus, in order to accelerate the process of finding
the hyper-parameters, this research uses a quasi-optimal
method based on optimization techniques [52].
The list of hyper-parameters used in
our SVM-based classifiers appears in Table 2. The table
shows the final values of C, p, and $\tau$ for each channel and
the particular kernel choice. In the case of SVMs based on
a polynomial kernel with a variable degree, it was found
that a third degree polynomial produced better results;
this is shown in the fourth column of Table 2.

Table 2 Kernel choice and parameters used with SVMs

| Channel | Linear | Quad. ($p = 2$) | Poly. ($p = 3$) | RBF $(C, \tau)$ |
|---|---|---|---|---|
| Kernel $K(\mathbf{x}_i, \mathbf{x}_j)$ | $\mathbf{x}_i^T \mathbf{x}_j$ | polynomial of degree $p$ | polynomial of degree $p$ | Gaussian-like exponential of width $\tau$ |
| Red | C = 7 | C = 4 | C = 0.5 | (9, 0.5) |
| Green | C = 3 | C = 2 | C = 2 | (33, 2) |
| Blue | C = 2 | C = 1 | C = 2 | (0.13, 0.5) |

The last group of classifiers is based on discriminant
analysis. Both Linear Discriminant Analysis (LDA)
[53] and Quadratic Discriminant Analysis (QDA) [54] are
closely related and are well known in the community for
their simplicity and the robustness provided by statistical
properties of the data. QDA and LDA achieve optimal
results, in terms of probability theory, when the data in
each class is independent and identically distributed (IID)
following a Gaussian distribution. Since this research uses
the KLT, the data is close to being IID; however, the data
is not actually IID, as in most real-life applications.
LDA and QDA require no parameters
except for the mean and covariance matrix estimates for
each channel; these are computed from the training set $\mathcal{D}$.
The experiments performed while training the classifiers
and the soft fusion are discussed next.

Experimental design 

The soft fusion of n classifiers for detecting leukocoria
requires an estimation of each classifier's importance, i.e.,
the i-th density $g^i$. This research defined each classifier's
importance based on its individual performance, using
several different performance metrics and averaging the
ranking in each individual metric. This section describes
the experimental process of evaluating each classifier and
the final value of the density $g^i$ corresponding to the i-th
classifier.

Cross-validation 

The whole database of eye images contains 144 examples.
We divided the database into 10 groups of approximately
equal size in order to use the well-known K-fold
cross validation (CV) technique. Cross validation helps
the researcher get an estimate of true classification
performance [55]. This research uses 10-fold CV (K = 10) in
order to determine the true importance of each classifier.

The database is divided into 10 groups of 14.4 data points
on average. The methodology selects which points
belong to each group randomly. Nine out of the 10 groups
follow the pre-processing and feature extraction procedure
explained earlier. Then the set of nine groups with
its corresponding target classes $d_i$ is defined as the training
dataset $\mathcal{D} = \{\mathbf{x}_i, d_i\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^2$, $d_i \in \{-1, 1\}$.
Then, the 10th group (the one not used for training) is
used as the testing set $\mathcal{K} = \{\mathbf{x}_j, d_j\}_{j=1}^{M}$, where N + M = 144.
The process is repeated 10 times, selecting a different
combination of nine groups each time and leaving the 10th out
for testing. Finally, the performances obtained with each
testing set are averaged. We ran 10-fold CV 100 times
in order to have more meaningful results, averaging over
the 100 CV runs. This process reduces the uncertainty
that the CV method will choose nearly the same sets of
data for the 10 groups. The following paragraph explains
the performance metrics used to rank the classifiers.
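The repeated cross-validation protocol can be sketched with the Python standard library alone. The generator below yields index lists rather than data; the function names, the seed handling, and the striding scheme used to form the folds are assumptions of this sketch.

```python
import random


def kfold_indices(n_samples, k, rng):
    """Randomly partition range(n_samples) into k folds of near-equal size."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_kfold(n_samples, k=10, repeats=100, seed=0):
    """Yield (train_indices, test_indices) for each fold of each repetition."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = kfold_indices(n_samples, k, rng)
        for i, test in enumerate(folds):
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield train, test
```

With 144 samples and K = 10, each repetition produces four folds of 15 samples and six folds of 14, which matches the stated average of 14.4. Any statistics needed for pre-processing (z-scoring, KLT) would be estimated from the `train` indices of each split only.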

Performance metrics 

Let us define the i-th difference $y_i - d_i$ as the i-th "residual
error", where $y_i$ is the actual output of the classifier when
the testing set input vector $\mathbf{x}_i$ is presented at its input, for
all $\{\mathbf{x}_i, d_i\} \in \mathcal{K}$. Commonly, machine learning researchers
use the following statistical metrics to quantify performance
based on the residual error: Root Mean Squared
Error (RMSE) and Normalized Root Mean Squared Error
(NRMSE). Such metrics are defined as follows:

$$\text{RMSE} = \sqrt{\frac{1}{M} \sum_{i=1}^{M} (y_i - d_i)^2}, \tag{9a}$$

$$\text{NRMSE} = \sqrt{\frac{1}{M \sigma} \sum_{i=1}^{M} (y_i - d_i)^2}, \tag{9b}$$

where $\sigma$ is the standard deviation of $y_i$.

From estimation theory it is known that if the residual error
has an expected value equal to zero and a
unit variance, one may have achieved the least-squares
solution to the problem, either linear or non-linear.
Furthermore, it is understood that as the variance of the residual
error approaches zero, the problem is better solved.
Therefore, we want to measure both the expected value
and the variance. Let us denote the expected value of the
residual error $\mu_\epsilon$ and the variance of the residual error
$\sigma_\epsilon^2 = \mathrm{Var}[y_i - d_i - \mu_\epsilon]$, and their sample-based estimators
as follows:

$$\mu_\epsilon = E[y_i - d_i] = \frac{1}{M} \sum_{i=1}^{M} (y_i - d_i), \tag{9c}$$

$$\sigma_\epsilon^2 = E\left[ (y_i - d_i - \mu_\epsilon)^2 \right] = \frac{1}{M - 1} \sum_{i=1}^{M} (y_i - d_i - \mu_\epsilon)^2, \tag{9d}$$

from where it is desired that both $|\mu_\epsilon|, \sigma_\epsilon \to 0$ as
$M \to \infty$.
On the other hand, some standard performance metrics 
for binary classification employ the well known confusion 
matrix. For binary classification, four possible predic- 
tion outcomes exist. A correct prediction is either a True 
Positive (TP) or a True Negative (TN), while an incor- 
rect prediction is either a False Positive (FP) or a False 
Negative (FN). Here 'Positive' and 'Negative' correspond 
to the predicted label of the example. 

From hereafter we denote TP as the total number of true
positives, TN as the total number of true negatives, FP
as the total number of false positives, and FN as the total
number of false negatives in a classification event using
a complete dataset, which in our case is the cross validation
set $\mathcal{K}$. Such definitions allow us to use the following
performance metrics based on a confusion matrix:



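$$\text{Accuracy} = \text{ACC} = \frac{TP + TN}{TP + FN + FP + TN}, \tag{9e}$$

$$\text{TP rate} = \text{TPR} = \frac{TP}{TP + FN}, \tag{9f}$$

$$\text{FP rate} = \text{FPR} = \frac{FP}{FP + TN}, \tag{9g}$$

$$\text{Specificity} = \text{SPC} = \frac{TN}{FP + TN}, \tag{9h}$$

$$\text{Positive Predictive Value} = \text{PPV} = \frac{TP}{TP + FP}, \tag{9i}$$

$$\text{Negative Predictive Value} = \text{NPV} = \frac{TN}{TN + FN}, \tag{9j}$$

$$\text{False Discovery Rate} = \text{FDR} = \frac{FP}{FP + TP}, \tag{9k}$$

$$\text{Matthews Correlation Coefficient} = \text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}, \tag{9l}$$

$$F_1\text{-Score} = 2 \times \frac{\left( \frac{TP}{TP + FP} \right) \left( \frac{TP}{TP + FN} \right)}{\left( \frac{TP}{TP + FP} \right) + \left( \frac{TP}{TP + FN} \right)}, \tag{9m}$$

$$\text{Balanced Error Rate} = \text{BER} = \frac{1}{2} \left( \frac{FP}{TN + FP} + \frac{FN}{FN + TP} \right). \tag{9n}$$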



Note that in the literature, one might also find the above
measures with different names; e.g., TPR is also known
as Sensitivity, SPC is also known as TN rate, PPV is also
known as Precision, and the $F_1$-Score is also known as the
F-Measure.
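The confusion-matrix metrics of Equations 9e through 9n translate directly into code. The following Python sketch assumes labels in {-1, 1} with 1 denoting the positive (leukocoric) class; the function names are hypothetical, and no guard against empty denominators is included.

```python
import math


def confusion_counts(predicted, desired):
    """Count TP, TN, FP, FN for labels in {-1, 1} (1 is the positive class)."""
    tp = tn = fp = fn = 0
    for y, d in zip(predicted, desired):
        if y == 1 and d == 1:
            tp += 1
        elif y == -1 and d == -1:
            tn += 1
        elif y == 1 and d == -1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def confusion_metrics(tp, tn, fp, fn):
    """Metrics of Equations 9e-9n computed from the four counts."""
    ppv = tp / (tp + fp)
    tpr = tp / (tp + fn)
    return {
        "ACC": (tp + tn) / (tp + fn + fp + tn),
        "TPR": tpr,
        "FPR": fp / (fp + tn),
        "SPC": tn / (fp + tn),
        "PPV": ppv,
        "NPV": tn / (tn + fn),
        "FDR": fp / (fp + tp),
        "MCC": (tp * tn - fp * fn)
               / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "F1": 2 * ppv * tpr / (ppv + tpr),
        "BER": 0.5 * (fp / (tn + fp) + fn / (fn + tp)),
    }
```

Note that FPR and SPC always sum to one, and that FDR is the complement of PPV, so ranking classifiers on all of them weights those aspects of performance more than once.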

In the literature, one can find another typical performance
metric based on the area under the Receiver Operating
Characteristic (ROC) curve [56]. The area under the
ROC curve, abbreviated AUC, provides a basis for judging
whether a classifier performs realistically better than
others in terms of the relationship between its TPR and
FPR.

The last performance metric we use is Cohen's kappa
measure $\kappa$. The $\kappa$ measure scores the number of correct
classifications independently for each class and aggregates
them [57]. This way of scoring is less sensitive to randomness
caused by a different number of examples in each
class; therefore, it is less sensitive to class bias in training
data.
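For reference, the commonly used textbook form of Cohen's kappa for a binary confusion matrix is sketched below in Python. It compares the observed agreement with the agreement expected by chance from the class marginals; whether this matches in every detail the implementation used in this research is an assumption.

```python
def cohen_kappa(tp, tn, fp, fn):
    """Cohen's kappa from binary confusion-matrix counts (standard definition)."""
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    # Chance agreement from the predicted and actual class marginals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1.0 - expected)
```

A classifier that always predicts the majority class obtains a kappa of zero under this definition even though its accuracy equals the majority-class proportion, which is what makes the measure useful for an unbalanced database.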

All the performance measures described in Equations 9a 
through 9n need to be interpreted according to a desired 
outcome. Table 3 shows all the performance metrics dis- 
cussed and their corresponding desired outcome; this will 
help interpret the results and rank the classifiers fairly 
well. 



Rivas-Perea et al. BMC Ophthalmology 2014, 14:110
http://www.biomedcentral.com/1471-2415/14/110



Table 3 Performance metrics and their desired outcome

| Metric | Interval or domain | Desired |
|---|---|---|
| RMSE | $\mathbb{R}^+$ | The smallest value. |
| NRMSE | $\mathbb{R}^+$ | The smallest value. |
| $\lvert \mu_\epsilon \rvert$ | $\mathbb{R}^+$ | The smallest value. |
| $\sigma_\epsilon$ | $\mathbb{R}^+$ | The smallest value. |
| ACC | [0, 1] | One. |
| TPR | [0, 1] | One. |
| FPR | [0, 1] | Zero. |
| SPC | [0, 1] | One. |
| PPV | [0, 1] | One. |
| NPV | [0, 1] | One. |
| FDR | [0, 1] | Zero. |
| MCC | [-1, 1] | One. |
| $F_1$-Score | [0, 1] | One. |
| BER | [0, 1] | Zero. |
| AUC | [0, 1] | One. |
| $\kappa$ | [0, 1] | One. |



Results 

Tables 4, 5 and 6 show the average performance of each 
classifier over 100 experiments using different metrics. 
Each table ranks the classifiers on different color chan- 
nel data: red, green, and blue, respectively. The number 
in parenthesis defines the rank of a classifier for that par- 
ticular metric (in each row). A classifier ranked as "(1)" is 
the best among all the others, consequently, one ranked 
as "(9)" is the worst. The average rank of each classi- 
fier is shown in the last row of each table and this is 
used to determine the actual importance of each classifier.
The i-th density, $g^i$, is computed using the following
expression:

$$g^i = \frac{r_i}{S_r}, \tag{10}$$

where $r_i$ is the average rank of each classifier and $S_r$ is the
sum of all classifier ranks. In this manner, the sum of all
densities is equal to one, which is desired [33].

From Table 4 we observe that for the red channel, the
three best ranked classifiers are LDA (DA$_1$), SVM
with RBF kernel (SVM$_4$), and SVM with linear kernel (SVM$_1$). Table 5
shows that for the green channel, SVM with RBF kernel,
SVM with polynomial kernel of third degree (SVM$_3$),
and LDA are the best ranked classifiers, respectively.
Similarly, Table 6 shows that for the blue channel, the SVM
with polynomial kernel of third degree, SVM with RBF
kernel, and SVM with linear kernel are the top three classifiers,
respectively.



Soft fusion classification and comparison 

Finally, we can perform the soft fusion of classifiers using
the densities found after performance analysis of the
classifiers. Since the densities, $g^i$, are now known, we can use
Equation 8 to determine the appropriate value for $\lambda$ and
then compute the $g_\lambda$-fuzzy measure using Equation 7 that
allows us to compute the fuzzy integral (Equation 6).

For comparison purposes we also use three of the most
common combination methods: 1) Average, 2) Weighted
Average, and 3) Majority. The Average method consists of
averaging the classification of all classifiers and choosing
the class closest to the average. The Weighted
Average method, in turn, takes into account the importance of
each classifier as determined by the densities $g^i$ and
multiplies each classifier's output by its corresponding
importance; the products are added together and the method
decides for the class closest to the sum. In contrast, the
Majority method considers all classifiers equally relevant
and takes a vote, deciding for the class that agrees with the
majority. Note that the Average and Majority methods
produce the same value for metrics based on classification error
(such as Accuracy and TPR), but differ in metrics
producing real values (such as RMSE). This is because the
Average method uses real values output from the individual
models, while the Majority method uses voting.
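The three baseline combination rules can be sketched as follows. This Python illustration reflects one reading of the description above, in which each classifier contributes a class decision in {-1, 1}; the function names and the tie-breaking toward the positive class are assumptions.

```python
def closest_class(score):
    """Map a real-valued score to the nearest label in {-1, 1} (ties go to 1)."""
    return 1 if score >= 0 else -1


def average_rule(labels):
    """Average of the class decisions and the class closest to that average."""
    score = sum(labels) / len(labels)
    return score, closest_class(score)


def weighted_average_rule(labels, densities):
    """Sum of decisions weighted by each classifier's density g^i."""
    score = sum(g * y for g, y in zip(densities, labels))
    return score, closest_class(score)


def majority_rule(labels):
    """Plain vote over the class decisions."""
    positives = sum(1 for y in labels if y == 1)
    return 1 if 2 * positives >= len(labels) else -1
```

Under this reading, `average_rule` and `majority_rule` always agree on the class, while the real-valued score returned by `average_rule` is what makes residual-based metrics such as RMSE differ between the two. The weighted rule can overturn the vote when a highly ranked classifier disagrees with several weaker ones.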

Table 7 shows the results of classification with the dif- 
ferent methods of combining classifiers. Note that these 
methods consider the information of all classifiers in all 
three channels and, thus, only one table is necessary. The
next section introduces the analysis of these results. Note,
however, that in the next section, the variables $p$ and $\alpha$ are
redefined with their traditional meaning in statistical
analysis and should not be confused with the variables
$p$ and $\alpha$ that, in the rest of the paper, represent a kernel
parameter and a DCT scaling function, respectively.

Discussion 

Table 7 shows that the proposed classification scheme per- 
forms better than the other three methodologies in most 
cases. The soft fusion of classifiers produces results that 
have less variability in the average case, as shown in the 
second-to-last row. 

The results in Tables 4, 5 and 6 clearly indicate that
classifiers that use the green channel information perform
better than those using blue or red channel information,
and that the classifiers using red channel information
perform the worst of all. We can therefore argue that the
most discriminant information is carried by the green
channel and that the information in the red channel may be
introducing noise into the soft fusion of classifiers.
Considering this possibility, we compare the best classifier
that uses the information of the green channel against the
proposed scheme, i.e., SVM with RBF kernel from Table 5
against the



Table 4 Rank of red channel classifiers by performance analysis

        ANN1       ANN2       ANN3       DA1        DA2        SVM1       SVM2       SVM3       SVM4
RMSE    1.180 (8)  1.172 (7)  1.221 (9)  1.097 (1)  1.146 (6)  1.103 (3)  1.144 (5)  1.124 (4)  1.100 (2)
NRMSE   1.214 (8)  1.206 (7)  1.257 (9)  1.129 (1)  1.179 (6)  1.136 (3)  1.177 (5)  1.157 (4)  1.133 (2)
|μ_ε|   0.136 (5)  0.041 (3)  0.010 (1)  0.121 (4)  0.163 (7)  0.068 (2)  0.298 (9)  0.221 (8)  0.158 (6)
σ_ε     1.171 (7)  1.173 (8)  1.223 (9)  1.094 (2)  1.138 (6)  1.105 (3)  1.108 (5)  1.106 (4)  1.092 (1)
ACC     0.651 (8)  0.656 (7)  0.626 (9)  0.699 (1)  0.672 (6)  0.696 (3)  0.673 (5)  0.684 (4)  0.697 (2)
TPR     0.775 (1)  0.741 (2)  0.697 (5)  0.711 (4)  0.672 (7)  0.729 (3)  0.619 (9)  0.659 (8)  0.694 (6)
FPR     0.556 (9)  0.486 (7)  0.492 (8)  0.320 (4)  0.329 (5)  0.360 (6)  0.238 (1)  0.274 (2)  0.298 (3)
SPC     0.444 (9)  0.514 (7)  0.508 (8)  0.680 (4)  0.671 (5)  0.640 (6)  0.762 (1)  0.726 (2)  0.702 (3)
PPV     0.700 (9)  0.718 (7)  0.703 (8)  0.787 (4)  0.773 (5)  0.771 (6)  0.813 (1)  0.800 (2)  0.795 (3)
NPV     0.545 (7)  0.544 (8)  0.502 (9)  0.585 (2)  0.551 (5)  0.586 (1)  0.545 (6)  0.560 (4)  0.579 (3)
FDR     0.300 (9)  0.282 (7)  0.297 (8)  0.213 (4)  0.227 (5)  0.229 (6)  0.187 (1)  0.200 (2)  0.205 (3)
MCC     0.232 (8)  0.259 (7)  0.206 (9)  0.381 (2)  0.333 (6)  0.363 (5)  0.370 (4)  0.372 (3)  0.385 (1)
F1      0.735 (4)  0.729 (5)  0.699 (9)  0.747 (2)  0.719 (7)  0.750 (1)  0.703 (8)  0.722 (6)  0.741 (3)
BER     0.390 (8)  0.372 (7)  0.397 (9)  0.305 (2)  0.329 (6)  0.316 (5)  0.309 (4)  0.308 (3)  0.302 (1)
AUC     0.610 (8)  0.628 (7)  0.603 (9)  0.695 (2)  0.671 (6)  0.684 (5)  0.691 (4)  0.692 (3)  0.698 (1)
κ       0.228 (8)  0.258 (7)  0.205 (9)  0.378 (2)  0.329 (6)  0.362 (4)  0.353 (5)  0.363 (3)  0.380 (1)
Avg.    7.29       6.47       8.06       2.47       5.88       3.82       4.59       3.88       2.53



The data in boldface indicates the best ranked method of each row, with the exception of the last row, which indicates the best three classifiers. 



Table 5 Rank of green channel classifiers by performance analysis

        ANN1       ANN2       ANN3       DA1        DA2        SVM1       SVM2       SVM3       SVM4
RMSE    0.787 (4)  0.791 (5)  0.800 (7)  0.780 (3)  0.828 (8)  0.796 (6)  0.838 (9)  0.706 (2)  0.673 (1)
NRMSE   0.810 (4)  0.814 (5)  0.823 (7)  0.802 (3)  0.853 (8)  0.819 (6)  0.863 (9)  0.727 (2)  0.693 (1)
|μ_ε|   0.030 (3)  0.025 (2)  0.028 (1)  0.075 (5)  0.059 (4)  0.107 (8)  0.137 (9)  0.081 (7)  0.078 (6)
σ_ε     0.788 (4)  0.792 (6)  0.801 (7)  0.779 (3)  0.829 (8)  0.791 (5)  0.830 (9)  0.704 (2)  0.671 (1)
ACC     0.845 (4)  0.843 (5)  0.839 (7)  0.848 (3)  0.828 (8)  0.842 (6)  0.824 (9)  0.875 (2)  0.887 (1)
TPR     0.888 (1)  0.884 (2)  0.883 (3)  0.848 (6)  0.839 (7)  0.831 (8)  0.805 (9)  0.868 (5)  0.878 (4)
FPR     0.227 (8)  0.226 (7)  0.233 (9)  0.153 (5)  0.189 (6)  0.140 (3)  0.143 (4)  0.113 (2)  0.099 (1)
SPC     0.773 (8)  0.774 (7)  0.767 (9)  0.847 (5)  0.811 (6)  0.860 (3)  0.857 (4)  0.887 (2)  0.901 (1)
PPV     0.867 (8)  0.868 (7)  0.864 (9)  0.903 (5)  0.881 (6)  0.908 (3)  0.904 (4)  0.928 (2)  0.937 (1)
NPV     0.806 (2)  0.802 (3)  0.797 (5)  0.770 (6)  0.751 (8)  0.753 (7)  0.725 (9)  0.801 (4)  0.816 (1)
FDR     0.133 (8)  0.132 (7)  0.136 (9)  0.097 (5)  0.119 (6)  0.092 (3)  0.096 (4)  0.072 (2)  0.063 (1)
MCC     0.667 (5)  0.664 (6)  0.656 (7)  0.684 (3)  0.641 (9)  0.676 (4)  0.645 (8)  0.742 (2)  0.766 (1)
F1      0.877 (3)  0.876 (4)  0.873 (6)  0.875 (5)  0.859 (8)  0.868 (7)  0.851 (9)  0.897 (2)  0.906 (1)
BER     0.170 (6)  0.171 (7)  0.175 (8)  0.152 (3)  0.175 (9)  0.155 (4)  0.169 (5)  0.123 (2)  0.110 (1)
AUC     0.830 (6)  0.829 (7)  0.825 (8)  0.848 (3)  0.825 (9)  0.845 (4)  0.831 (5)  0.877 (2)  0.890 (1)
κ       0.666 (5)  0.663 (6)  0.655 (7)  0.682 (3)  0.639 (8)  0.672 (4)  0.638 (9)  0.739 (2)  0.763 (1)
Avg.    4.88       5.35       6.82       4.06       7.41       5.12       7.29       2.59       1.47



The data in boldface indicates the best ranked method of each row, with the exception of the last row, which indicates the best three classifiers. 



Table 6 Rank of blue channel classifiers by performance analysis

        ANN1       ANN2       ANN3       DA1        DA2        SVM1       SVM2       SVM3       SVM4
RMSE    0.863 (8)  0.858 (7)  0.851 (6)  0.827 (4)  0.866 (9)  0.803 (3)  0.848 (5)  0.791 (1)  0.792 (2)
NRMSE   0.888 (8)  0.883 (7)  0.876 (6)  0.851 (4)  0.891 (9)  0.826 (3)  0.873 (5)  0.814 (1)  0.815 (2)
|μ_ε|   0.063 (9)  0.058 (7)  0.063 (8)  0.024 (2)  0.029 (4)  0.043 (6)  0.029 (3)  0.018 (1)  0.036 (5)
σ_ε     0.862 (8)  0.858 (7)  0.851 (6)  0.830 (4)  0.868 (9)  0.805 (3)  0.851 (5)  0.793 (1)  0.794 (2)
ACC     0.813 (8)  0.816 (7)  0.818 (6)  0.829 (4)  0.813 (9)  0.839 (3)  0.820 (5)  0.844 (1)  0.843 (2)
TPR     0.876 (2)  0.876 (3)  0.880 (1)  0.853 (7)  0.838 (9)  0.854 (6)  0.844 (8)  0.868 (4)  0.860 (5)
FPR     0.291 (9)  0.284 (8)  0.284 (7)  0.212 (4)  0.230 (6)  0.186 (2)  0.221 (5)  0.197 (3)  0.186 (1)
SPC     0.709 (9)  0.716 (8)  0.716 (7)  0.788 (4)  0.770 (6)  0.814 (2)  0.779 (5)  0.803 (3)  0.814 (1)
PPV     0.834 (9)  0.838 (8)  0.838 (7)  0.870 (4)  0.858 (6)  0.884 (2)  0.864 (5)  0.880 (3)  0.885 (1)
NPV     0.775 (5)  0.776 (4)  0.782 (2)  0.763 (7)  0.741 (9)  0.770 (6)  0.750 (8)  0.784 (1)  0.777 (3)
FDR     0.166 (9)  0.162 (8)  0.162 (7)  0.130 (4)  0.142 (6)  0.116 (2)  0.136 (5)  0.120 (3)  0.115 (1)
MCC     0.597 (9)  0.602 (8)  0.608 (6)  0.638 (4)  0.604 (7)  0.661 (3)  0.619 (5)  0.668 (2)  0.669 (1)
F1      0.854 (8)  0.856 (6)  0.858 (5)  0.862 (4)  0.848 (9)  0.869 (3)  0.854 (7)  0.874 (1)  0.873 (2)
BER     0.208 (9)  0.204 (8)  0.202 (7)  0.179 (4)  0.196 (6)  0.166 (3)  0.188 (5)  0.165 (2)  0.163 (1)
AUC     0.792 (9)  0.796 (8)  0.798 (7)  0.821 (4)  0.804 (6)  0.834 (3)  0.812 (5)  0.835 (2)  0.837 (1)
κ       0.595 (9)  0.600 (8)  0.606 (6)  0.637 (4)  0.603 (7)  0.660 (3)  0.619 (5)  0.668 (2)  0.668 (1)
Avg.    8.00       7.00       5.88       4.24       7.41       3.29       5.35       1.88       1.94



The data in boldface indicates the best ranked method of each row, with the exception of the last row, which indicates the best three classifiers.

Table 7 Performance analysis of different methods of classifier combination

           Average            Weighted avg.      Majority           Soft fusion
RMSE       0.682 ± 0.021 (3)  0.674 ± 0.021 (2)  0.705 ± 0.030 (4)  0.652 ± 0.014 (1)
NRMSE      0.702 ± 0.021 (3)  0.694 ± 0.021 (2)  0.725 ± 0.031 (4)  0.671 ± 0.014 (1)
|μ_ε|      0.058 ± 0.018 (1)  0.065 ± 0.016 (2)  0.071 ± 0.023 (3)  0.114 ± 0.008 (4)
σ_ε        0.682 ± 0.021 (3)  0.673 ± 0.021 (2)  0.703 ± 0.032 (4)  0.644 ± 0.015 (1)
ACC        0.876 ± 0.011 (3)  0.876 ± 0.011 (3)  0.876 ± 0.011 (3)  0.881 ± 0.011 (1)
TPR        0.872 ± 0.009 (3)  0.872 ± 0.009 (3)  0.872 ± 0.009 (3)  0.878 ± 0.008 (1)
FPR        0.119 ± 0.026 (3)  0.119 ± 0.026 (3)  0.119 ± 0.026 (3)  0.114 ± 0.024 (1)
SPC        0.881 ± 0.026 (3)  0.881 ± 0.026 (3)  0.881 ± 0.026 (3)  0.886 ± 0.024 (1)
PPV        0.925 ± 0.015 (3)  0.925 ± 0.015 (3)  0.925 ± 0.015 (3)  0.928 ± 0.014 (1)
NPV        0.805 ± 0.011 (3)  0.805 ± 0.011 (3)  0.805 ± 0.011 (3)  0.813 ± 0.011 (1)
FDR        0.075 ± 0.015 (3)  0.075 ± 0.015 (3)  0.075 ± 0.015 (3)  0.072 ± 0.014 (1)
MCC        0.742 ± 0.024 (3)  0.742 ± 0.024 (3)  0.742 ± 0.024 (3)  0.752 ± 0.023 (1)
F1         0.898 ± 0.008 (3)  0.898 ± 0.008 (3)  0.898 ± 0.008 (3)  0.902 ± 0.008 (1)
BER        0.123 ± 0.013 (3)  0.123 ± 0.013 (3)  0.123 ± 0.013 (3)  0.118 ± 0.013 (1)
AUC        0.891 ± 0.009 (3)  0.891 ± 0.009 (2)  0.877 ± 0.013 (4)  0.918 ± 0.007 (1)
κ          0.739 ± 0.024 (3)  0.739 ± 0.024 (3)  0.739 ± 0.024 (3)  0.750 ± 0.023 (1)
Avg. SD    0.0169             0.0168             0.0196             0.0141
Avg. Rank  2.8824             2.6471             3.1176             1.3529



The data in boldface indicates the best ranked classification method of each row. 



soft fusion method in Table 7. In this comparison, the
proposed soft fusion of classifiers performs better only in
terms of RMSE, NRMSE, σ_ε, and AUC. This means that the
proposed scheme has better statistical stability, and that
its trade-off between TPR and FPR is better. In all the
remaining metrics the SVM classifier with RBF kernel
performs better than the soft fusion, arguably because of
the noise introduced by the red channel information.
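For readers comparing the threshold-based rows of Tables 4 to 7, the sketch below computes those metrics from a binary confusion matrix. It assumes the standard textbook definitions (the paper's own definitions are given elsewhere and are not repeated here), with leukocoria as the positive class; the function name and the example counts are hypothetical. The relations FPR = 1 - SPC, FDR = 1 - PPV and BER = (FNR + FPR)/2 are consistent with the values reported in the tables.

```python
from math import sqrt


def confusion_metrics(tp, fn, fp, tn):
    """Standard binary metrics from counts of true/false positives and negatives."""
    n = tp + fn + fp + tn
    tpr = tp / (tp + fn)                  # sensitivity
    fpr = fp / (fp + tn)
    ppv = tp / (tp + fp)
    acc = (tp + tn) / n
    # Chance agreement for Cohen's kappa, from the row and column marginals.
    chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return {
        "ACC": acc,
        "TPR": tpr,
        "FPR": fpr,
        "SPC": 1.0 - fpr,
        "PPV": ppv,
        "NPV": tn / (tn + fn),
        "FDR": 1.0 - ppv,
        "MCC": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "F1": 2 * tp / (2 * tp + fp + fn),
        "BER": 0.5 * ((1.0 - tpr) + fpr),
        "kappa": (acc - chance) / (1.0 - chance),
    }


# Hypothetical counts, chosen only to exercise the function.
example = confusion_metrics(tp=87, fn=13, fp=12, tn=88)
```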

We continued by performing the well-known Friedman test
and, since the null hypothesis was rejected, the post hoc
Nemenyi test [58]. The null hypothesis being tested is that
the different approaches compared in Table 7 perform the
same and that their performance differences are random.
Friedman's test rejected the null hypothesis with
$p = 1.12 \times 10^{-5}$. We then determined the critical
difference (CD) of the Nemenyi test for comparing four
methods of combining classifiers over 17 different
performance metrics at a significance level $\alpha = 0.05$:
$CD = 2.569 \sqrt{\frac{4 \cdot 5}{6 \cdot 17}} = 1.1376$.
The difference in average rank between the two best
methods, i.e., Weighted Average and Soft Fusion, is
2.6471 - 1.3529 = 1.2942 > 1.1376. Since this difference
exceeds the CD, we conclude that the Soft Fusion of
classifiers performs significantly better than the other
three methods in a statistical sense. Note that even though
both the Soft Fusion and Weighted Average methods take the
importance of each classifier into account, the proposed
classification scheme is still significantly better.
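The critical-difference arithmetic above can be checked with a short script. It uses the Nemenyi formula from [58], CD = q_α sqrt(k(k + 1) / (6N)), with the critical value q_0.05 = 2.569 for k = 4 methods and N = 17 metrics, and the average ranks from Table 7. The function and variable names are illustrative only.

```python
from math import sqrt


def nemenyi_cd(q_alpha, k, n):
    """Critical difference for k methods ranked over n data sets (here, metrics)."""
    return q_alpha * sqrt(k * (k + 1) / (6.0 * n))


# Average ranks as reported in the last row of Table 7.
avg_rank = {
    "Average": 2.8824,
    "Weighted avg.": 2.6471,
    "Majority": 3.1176,
    "Soft fusion": 1.3529,
}

cd = nemenyi_cd(2.569, k=4, n=17)

# A method is significantly worse than Soft fusion if its rank gap exceeds the CD.
significant = {
    name: (rank - avg_rank["Soft fusion"]) > cd
    for name, rank in avg_rank.items()
    if name != "Soft fusion"
}
```

Because Weighted Average is the closest competitor, showing that its rank gap exceeds the CD is enough to cover the other two methods as well.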

Figure 4 depicts an analysis of the classification certainty
and uncertainty. This analysis is possible because the
fuzzy integral (Equation 6) gives us the certainty that a
classifier's output y_i belongs to one class or the other.
From the upper part of Figure 4 we can observe that images
on the threshold of being misclassified as leukocoric or
misclassified as healthy are extremely similar and, thus,
difficult to classify. The lower part of Figure 4
illustrates the problem when images are on the threshold of
being correctly classified as healthy or leukocoric; here
the problem seems to be related to the resolution of the
original image: the lower the resolution, the higher the
risk of the image being misclassified. The direction in
which the eye is gazing also affects the classification to
some degree. This is expected, since the white reflection of
the leukocoric eye is best observed when the eye is looking
directly towards the camera and its source of light; the
converse is also true and affects classification. Skin color
and uneven illumination problems were reduced by the image
preprocessing explained earlier; however, experimental proof
of this remains pending for future publications.

[Figure 4 panels: "Class: healthy" and "Class: leukocoria"; axes: "Degree of certainty of class healthy", "Degree of certainty of class leukocoria", "Degree of uncertainty of class leukocoria"; misclassified examples: "True class: leukocoria. Classified as: healthy" and "True class: healthy. Classified as: leukocoria".]

Figure 4 Analysis of classification certainty and uncertainty as the eye images are classified as healthy or leukocoric.



Conclusions 

The proposed classification scheme presented in this
research uses a soft fusion of multichannel classifiers
that are experts in detecting leukocoria in human eyes.
These experts are trained with features extracted from
RGB images preprocessed to overcome poor illumination
and skin color variation using the DCT, statistical
normalization of the images, and the KLT.

This research uses nine different classifiers per channel
for a total of 27 experts. These include neural
networks, linear discriminant classifiers, and support
vector machines. The fuzzy densities, i.e., the importance
of each classifier, were estimated experimentally using
cross-validation. The null hypothesis was rejected and we
demonstrated that the proposed classification scheme
performs significantly better than the other combination
methods. Furthermore, it was shown that the green channel
provides more discriminant information than the other two.

While a soft fusion of classifiers is a good alternative 
in the detection of leukocoria in eyes of infants, it is 
just one part of a larger program to identify leukocoria 
in natural images. Other areas of research include eye 
localization (to improve detection), age discrimination (to 
reduce false positives on adult subjects), and alternative 
learning-based methods for leukocoria detection [59,60]. 

Consent 

Written informed consent was obtained from the patient's 
parents for the publication of this report and any accom- 
panying images. 



Competing interests 

The authors declare that they have no competing interests.

Authors' contributions

PRP designed and performed the soft classification study. All authors were 
actively involved in the project. BFS provided the data. PRP, EB, and GH 
conducted the analysis. GH and EB revised early drafts of the manuscript. All 
authors commented on and approved the final version of the manuscript. 

Acknowledgements 

This work was supported in part by the National Council for Science and
Technology (CONACyT), Mexico, under grant 193324/303732 provided to PRP,
and a start-up fund provided to BFS by Baylor University.

Author details 

1 Department of Computer Science, Baylor University, One Bear Place #97356,
Waco, TX 76798-7356, USA. 2 Department of Chemistry & Biochemistry, Baylor
University, One Bear Place #97348, Waco, TX 76798-7348, USA.

Received: 31 March 2014 Accepted: 21 August 2014 
Published: 9 September 2014 

References 

1. Balmer A, Munier F: Leukocoria in the child: urgency and challenge.
Klinische Monatsblatter Fur Augenheilkunde 1999, 214(5):332-335.

2. Meire FM, Lafaut BA, Speleman F, Hanssens M: Isolated norrie disease in
a female caused by a balanced translocation t(x,6). Ophthalmic Genet
1998, 19(4):203-207.

3. Meier P, Sterker I, Tegetmeyer H: Leucocoria in childhood. Klinische
Monatsblatter Fur Augenheilkunde 2006, 223(6):521-527.

4. Abramson DH, Beaverson K, Sangani P, Vora RA, Lee TC, Hochberg HM,
Kirszrot J, Ranjithan M: Screening for retinoblastoma: presenting signs
as prognosticators of patient and ocular survival. Pediatrics 2003,
112(6 Pt 1):1248-1255.

5. Phan IT, Stout T: Retinoblastoma presenting as strabismus and
leukocoria. J Patient Saf 2010, 157(5):858.

6. Poulaki V, Mukai S: Retinoblastoma: genetics and pathology. Int
Ophthalmol Clin 2009, 49(1):155-164.

7. Rodriguez-Galindo C, Wilson MW, Chantada G, Fu L, Qaddoumi I, Antoneli
C, Leal-Leal C, Sharma T, Barnoya M, Epelman S, Pizzarello L, Kane JR,
Barfield R, Merchant TE, Robison LL, Murphree AL, Chevez-Barrios P,
Dyer MA, O'Brien J, Ribeiro RC, Hungerford J, Helveston EM, Haik BG,
Wilimas J: Retinoblastoma: one world, one vision. Pediatrics 2008,
122(3):763-770.

8. Melamud A, Palekar R, Singh A: Retinoblastoma. Am Fam Physician 2006,
73(6):1039-1044.

9. Houston SK, Murray TG, Wolfe SQ, Fernandes CE: Current update on
retinoblastoma. Int Ophthalmol Clin 2011, 51(1):77-91.

10. Abdolvahabi A, Taylor BW, Holden RL, Shaw EV, Kentsis A,
Rodriguez-Galindo C, Mukai S, Shaw BF: Colorimetric and longitudinal
analysis of leukocoria in recreational photographs of children with
retinoblastoma. PloS one 2013, 8(10):76677.
doi:10.1371/journal.pone.0076677.

11. Singman EL: Automating the assessment of visual dysfunction after
traumatic brain injury. Med instrum 2013, 1(1):3.

12. Khan AO, Al-Mesfer S: Lack of efficacy of dilated screening for
retinoblastoma. J Pediatr Ophthalmol Strabismus 2005,
42(4):205-102334.

13. Li J, Coats DK, Fung D, Smith EO, Paysse E: The detection of simulated
retinoblastoma by using red-reflex testing. Pediatrics 2010,
126(1):202-207.

14. Marcou V, Vacherot B, El-Ayoubi M, Lescure S, Moriette G: [abnormal
ocular findings in the nursery and in the first few weeks of life: a
mandatory, yet difficult and neglected screening]. Arch Pediatr 2009,
16(Suppl 1):38-41.

15. Balmer A, Munier F: Differential diagnosis of leukocoria and
strabismus, first presenting signs of retinoblastoma. Clin Ophthalmol
2007, 1(4):431-439.

16. Wallach M, Balmer A, Munier F, Houghton S, Pampallona S, von der Weid
N, Beck Popovic M: Shorter time to diagnosis and improved stage at
presentation in Swiss patients with retinoblastoma treated from
1963 to 2004. Pediatrics 2006, 118(5):1493-1498.

17. Imhof SM, Moll AC, Schouten-van Meeteren AY: Stage of presentation
and visual outcome of patients screened for familial
retinoblastoma: nationwide registration in the netherlands.
Br J Ophthalmol 2006, 90(7):875-878.

18. Goddard AG, Kingston JE, Hungerford JL: Delay in diagnosis of
retinoblastoma: risk factors and treatment outcome. Br J Ophthalmol
1999, 83(12):1320-1323.

19. Butros LJ, Abramson DH, Dunkel IJ: Delayed diagnosis of
retinoblastoma: analysis of degree, cause, and potential
consequences. Pediatrics 2002, 109(3):45.

20. Shields CL, Shields JA: Retinoblastoma management: advances in
enucleation, intravenous chemoreduction, and intra-arterial
chemotherapy. Curr Opin Ophthalmol 2010, 21(3):203-212.

21. Friedrich MJ: Retinoblastoma therapy delivers power of
chemotherapy with surgical precision. JAMA: J Am Med Assoc 2011,
305(22):2276-2278.

22. Cruz JA, Wishart DS: Applications of machine learning in cancer
prediction and prognosis. Cancer Inform 2006, 2:59-77.

23. Drier Y, Domany E: Do two machine-learning based prognostic
signatures for breast cancer capture the same biological processes?
PloS one 2011, 6(3):1-7.

24. Kim S, Yoon S: Adaboost-based multiple svm-rfe for classification of
mammograms in ddsm. BMC Med Inform Decis Making 2009, 9:1-10.

25. Doyle S, Feldman M, Tomaszewski J, Madabhushi A: A boosted bayesian
multi-resolution classifier for prostate cancer detection from
digitized needle biopsies. IEEE Trans Biomed Eng 2010, 59(5):1205-1218.
doi:10.1109/TBME.2010.2053540.

26. Zhou ZH, Jiang Y, Yang YB, Chen SF: Lung cancer cell identification
based on artificial neural network ensembles. Artif Intell Med 2002,
24(1):25-36.

27. Mango LJ: Computer-assisted cervical cancer screening using neural
networks. Cancer Lett 1994, 77(2-3):155-162.

28. Ercal F, Chawla A, Stoecker VW, Lee HC, Moss RH: Neural network
diagnosis of malignant melanoma from color images. IEEE Trans
Biomed Eng 1994, 41(9):837-845. doi:10.1109/10.312091.

29. Blum A, Luedtke H, Ellwanger U, Schwabe R, Rassner G, Garbe C: Digital
image analysis for diagnosis of cutaneous melanoma, development
of a highly effective computer algorithm based on analysis of 837
melanocytic lesions. Br J Dermatol 2004, 151(5):1029-1038.
doi:10.1111/j.1365-2133.2004.06210.x.



30. Ganster H, Pinz A, Rohrer R, Wildling E, Binder M, Kittler H: Automated
melanoma recognition. IEEE Trans Med Imaging 2001, 20(3):233-239.
doi:10.1109/42.918473.

31. Garcia-Uribe A, Kehtarnavaz N, Marquez G, Prieto V, Duvic M, Wang LV:
Skin cancer detection by spectroscopic oblique-incidence
reflectometry: classification and physiological origins. Appl Opt 2004,
43(13):2643-2650.

32. Viola P, Jones M: Rapid object detection using a boosted cascade of
simple features. In Computer Vision and Pattern Recognition, 2001. CVPR
2001. Proceedings of the 2001 IEEE Computer Society Conference On. Volume
1. Piscataway: IEEE; 2001:511-518.

33. Cho SB, Kim JH: Multiple network fusion using fuzzy logic. Neural
Netw IEEE Trans 1995, 6(2):497-501.

34. Cho SB, Kim JH: Combining multiple neural networks by fuzzy
integral for robust classification. Syst Man Cybernet IEEE Trans 1995,
25(2):380-384.

35. Abdallah ACB, Frigui H, Gader P: Adaptive local fusion with fuzzy
integrals. Fuzzy Syst IEEE Trans 2012, 20(5):849-864.

36. Linda O, Manic M: Interval type-2 fuzzy voter design for fault tolerant
systems. Inf Sci 2011, 181(14):2933-2950.

37. Wang D, Keller JM, Carson CA, McAdo-Edwards KK, Bailey CW: Use of
fuzzy-logic-inspired features to improve bacterial recognition
through classifier fusion. Syst Man Cybernet Part B: Cybernet IEEE Trans
1998, 28(4):583-591.

38. Gader PD, Mohamed MA, Keller JM: Fusion of handwritten word
classifiers. Pattern Recognit Lett 1996, 17(6):577-584.

39. Wang Y, Wu J: Fuzzy integrating multiple svm classifiers and its
application in credit scoring. In Machine Learning and Cybernetics, 2006
International Conference On. Piscataway: IEEE; 2006:3621-3626.

40. Benediktsson JA, Sveinsson JR, Ingimundarson JI, Sigurdsson HS, Ersoy
OK: Multistage classifiers optimized by neural networks and genetic
algorithms. Nonlinear Anal: Theory Methods Appl 1997, 30(3):1323-1334.

41. Du S, Shehata M, Badawy W: A novel algorithm for illumination
invariant dct-based face recognition. In Electrical Computer Engineering
(CCECE), 2012 25th IEEE Canadian Conference On. Piscataway: IEEE;
2012:1-4.

42. Najim M: Modeling, Estimation and Optimal Filtering in Signal Processing.
Chap. Karhunen Loeve Transform. London: Wiley - ISTE; 2010:335-340.

43. Hua Y, Liu W: Generalized karhunen-loeve transform. Signal Process
Lett IEEE 1998, 5(6):141-142.

44. Kuncheva LI, Bezdek JC, Duin RPW: Decision templates for multiple
classifier fusion: an experimental comparison. Pattern Recognit 2001,
34(2):299-314.

45. Kittler J, Hatef M, Duin RPW, Matas J: On combining classifiers. Pattern
Anal Mach Intell IEEE Trans 1998, 20(3):226-239.

46. Jordan MI, Xu L: Convergence results for the em approach to
mixtures of experts architectures. Neural Netw 1995, 8(9):1409-1431.

47. Benediktsson JA, Swain PH: Consensus theoretic classification
methods. Syst Man Cybernet IEEE Trans 1992, 22(4):688-704.

48. Sugeno M: Fuzzy measures and fuzzy integrals: a survey. Fuzzy
Automata Decis Process 1977, 78(33):89-102.

49. Chacon MI, Rivas-Perea P: Performance analysis of the feedforward
and som neural networks in the face recognition problem. In IEEE
Symposium on Computational Intelligence in Image and Signal Processing,
2007. CIISP 2007. Hawaii, USA. Piscataway: IEEE; 2007:313-318.

50. Cristianini N, Scholkopf B: Support vector machines and kernel
methods: the new generation of learning machines. Ai Magazine
2002, 23(3):31.

51. Haykin SS: Neural Networks and Learning Machines. Upper Saddle River:
Pearson Education; 2009.

52. Rivas-Perea P, Cota-Ruiz J, Rosiles J-G: A nonlinear least squares
quasi-newton strategy for lp-svr hyper-parameters selection. Int J
Mach Learn Cybernet 2013, 5(4):579-597.

53. Yang J, Frangi AF, Yang J-Y, Zhang D, Jin Z: Kpca plus lda: a complete
kernel fisher discriminant framework for feature extraction and
recognition. Pattern Anal Mach Intell IEEE Trans 2005, 27(2):230-244.

54. Frigyik BA, Gupta MR: Bounds on the bayes error given moments.
Inf Theory IEEE Trans 2012, 58(6):3606-3612.

55. Cawley GC: Leave-one-out cross-validation based model selection
criteria for weighted ls-svms. In Neural Networks, 2006. IJCNN'06.
International Joint Conference On. Piscataway: IEEE; 2006:1661-1668.






56. Fawcett T: Roc graphs: notes and practical considerations for
researchers. Mach Learn 2004, 31:1-38.

57. Carletta J: Assessing agreement on classification tasks: the kappa
statistic. Comput Linguist 1996, 22(2):249-254.

58. Demsar J: Statistical comparisons of classifiers over multiple data
sets. J Mach Learn Res 2006, 7:1-30.

59. Henning R, Rivas-Perea P, Shaw B, Hamerly G: A convolutional neural
network approach for classifying leukocoria. In Image Analysis and
Interpretation (SSIAI), 2014 IEEE Southwest Symposium On. Piscataway: IEEE;
2014:9-12. doi:10.1109/SSIAI.2014.6806016.

60. Rivas-Perea P, Henning R, Shaw B, Hamerly G: Finding the smallest circle
containing the iris in the denoised wavelet domain. In Image Analysis
and Interpretation (SSIAI), 2014 IEEE Southwest Symposium On. Piscataway:
IEEE; 2014. doi:10.1109/SSIAI.2014.6806017.

doi:10.1186/1471-2415-14-110

Cite this article as: Rivas-Perea et al.: Detection of leukocoria using a soft
fusion of expert classifiers under non-clinical settings. BMC Ophthalmology
2014, 14:110.






